ActivityNet Challenge 2017 Summary
Authors
Abstract
Most traditional video action recognition methods are based on trimmed videos, each containing a single action, but most real-world videos are untrimmed. To overcome this difficulty to some extent, we propose a method based on the fusion of multiple features for the untrimmed video classification task of the ActivityNet Challenge 2017. We use CNN features, MBH features, and stacked C3D features for classification, then train a one-vs-rest linear SVM classifier on each feature type. Finally, we fuse the three sets of predictions by voting to obtain the final result.
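A minimal sketch of the fusion pipeline the abstract describes, assuming each feature type has already been extracted and pooled into one fixed-length vector per video. The random data, feature dimensions, and C value are illustrative stand-ins rather than the authors' settings; scikit-learn's LinearSVC handles the one-vs-rest decomposition internally.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 10

# Stand-in, randomly generated video-level features; dimensions are illustrative.
dims = {"cnn": 2048, "mbh": 512, "c3d": 4096}
y_train = rng.integers(0, n_classes, n_train)
train_feats = {k: rng.normal(size=(n_train, d)) for k, d in dims.items()}
test_feats = {k: rng.normal(size=(n_test, d)) for k, d in dims.items()}

# One classifier per feature type; LinearSVC trains one-vs-rest binary SVMs
# internally for multiclass labels.
clfs = {k: LinearSVC(C=1.0).fit(X, y_train) for k, X in train_feats.items()}

# Late fusion by voting: stack the three label predictions per test video
# and keep the most frequent one.
votes = np.stack([clfs[k].predict(test_feats[k]) for k in dims])  # (3, n_test)
fused = np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes.T])
print(fused[:10])
```

Voting on hard labels is the simplest late-fusion rule; averaging the classifiers' decision scores instead would let more confident classifiers weigh more heavily.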
منابع مشابه
Temporal Convolution Based Action Proposal: Submission to ActivityNet 2017
In this notebook paper, we describe our approach in the submissions to the temporal action proposal (task 3) and temporal action localization (task 4) tracks of the ActivityNet Challenge hosted at CVPR 2017. Since accuracy on the action classification task is already very high (nearly 90% on the ActivityNet dataset), we believe that the main bottleneck for temporal action localization is the quality of action ...
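As a rough illustration of the idea, assuming per-snippet features laid out along the time axis: a temporal convolution can score each time step for "actionness", and contiguous high-scoring runs become proposals. Everything here (function names, the random-kernel setup, the thresholding rule) is a hypothetical sketch, not the submission's actual network.

```python
import numpy as np

def actionness_scores(snippet_feats, kernel, bias=0.0):
    """Toy temporal convolution over per-snippet features: high scores mark
    time steps likely to lie inside an action instance.

    snippet_feats: (T, dim) features along time; kernel: (k, dim) filter.
    """
    T = snippet_feats.shape[0]
    k = kernel.shape[0]
    pad = k // 2
    x = np.pad(snippet_feats, ((pad, pad), (0, 0)))  # keep output length T
    raw = np.array([(x[t:t + k] * kernel).sum() for t in range(T)]) + bias
    return 1.0 / (1.0 + np.exp(-raw))  # sigmoid actionness in [0, 1]

def group_proposals(scores, threshold=0.5):
    # Cut a proposal out of every contiguous run above the threshold.
    above = scores > threshold
    edges = np.flatnonzero(np.diff(above.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(scores)]))
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if above[s]]
```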
UC Merced Submission to the ActivityNet Challenge 2016
This notebook paper describes our system for the untrimmed classification task in the ActivityNet Challenge 2016. We investigate multiple state-of-the-art approaches for action recognition in long, untrimmed videos. We exploit hand-crafted motion boundary histogram features as well as feature activations from deep networks such as VGG16, GoogLeNet, and C3D. These features are separately fed to lin...
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016
This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016. We follow the basic pipeline of very deep two-stream CNNs [16] and further raise the performance via a number of other techniques. Specifically, we use the latest deep model architectures, e.g., ResNet and Inception V3, and introduce a new aggregation scheme (top-k ...
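The "(top-k ...)" scheme the snippet cuts off at is, in its generic form, top-k pooling over per-snippet scores; a small sketch of that generic form follows, though the submission's exact variant may differ.

```python
import numpy as np

def top_k_aggregate(snippet_scores, k=5):
    """Video-level class scores from per-snippet scores: for each class,
    average only its k highest snippet scores, so a few confident snippets
    are not washed out by long stretches of background.

    snippet_scores: (n_snippets, n_classes) array.
    """
    k = min(k, snippet_scores.shape[0])
    top = np.sort(snippet_scores, axis=0)[-k:]  # k largest rows per class
    return top.mean(axis=0)
```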
Temporal Segment Networks for Action Recognition in Videos
Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structures with a new segment-based...
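A sketch of the two mechanisms the TSN abstract names, sparse segment-based sampling and a consensus over segments. The function names here are mine, and the real TSN backpropagates through the consensus to train the backbone end to end, which this NumPy sketch omits.

```python
import numpy as np

def sample_snippets(n_frames, n_segments=3, rng=None):
    """Split the video into equal segments and draw one frame index from
    each, covering the full duration with a fixed, small compute budget."""
    rng = rng or np.random.default_rng()
    edges = np.linspace(0, n_frames, n_segments + 1).astype(int)
    return np.array([rng.integers(lo, max(hi, lo + 1))
                     for lo, hi in zip(edges[:-1], edges[1:])])

def segmental_consensus(snippet_logits):
    """Average the per-snippet class scores into one video-level prediction
    (averaging is one of the consensus functions the TSN paper considers)."""
    return np.asarray(snippet_logits).mean(axis=0)
```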
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential ...
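In simplified form, an attention "cluster" is a set of independent attention units over the bag of local features: each unit computes softmax weights and a weighted average, and the per-unit summaries are concatenated. The sketch below uses random queries in place of learned parameters and omits the paper's further refinements (such as its shifting operation).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_clusters(local_feats, queries):
    """Concatenate several attention-weighted averages of the local features.

    local_feats: (n_local, dim) local features from one video.
    queries: (n_clusters, dim) stand-ins for learned attention parameters.
    """
    summaries = [softmax(local_feats @ q) @ local_feats for q in queries]
    return np.concatenate(summaries)  # (n_clusters * dim,)
```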
Journal: CoRR
Volume: abs/1710.08011
Publication year: 2017